-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply the cuFile error work around to data_sink as well #15335
Apply the cuFile error work around to data_sink as well #15335
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CUDA API call in a global instance may be a CUDA violation.
…bug-cuda-ctx-always-global-init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for doing this.
// Workaround for https://github.com/rapidsai/cudf/issues/14140, where cuFileDriverOpen errors | ||
// out if no CUDA calls have been made before it. This is a no-op if the CUDA context is already | ||
// initialized | ||
cudaFree(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we hard code 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we hard code
0
?
This initializes the context without actually doing anything.
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ga042655cbbf3408f01061652a075e094
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. Maybe the CUDA doc is not accurate? If the input parameter is a pointer, we should pass in nullptr
instead of 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, cudaFree(0)
is the idiomatic way to do this. nullptr would be a better option in general, but it looks like cudaFree(0)
is recognized as the idiom. I haven't seen a single reference where cudaFree(nullptr)
is used.
Edit: google search for "cudaFree(nullptr)": 121 hits vs cudaFree(0) 1220 hits 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the idiom that I am familiar with as well.
/merge |
Description
Issue #14140
Follow-up on #15293
Moving the
cudaFree(0)
call to a function called both by filedatasource
anddata_sink
.Checklist